Fully-Bayesian Functional Modeling of Continuous Glucose Monitoring Data

Joseph Sartini, PhD Candidate

Johns Hopkins University

Agenda

  • Modeling Continuous Glucose Monitor (CGM) Data

  • A Bayesian Functional Model for CGM Data

  • Model Performance

  • Model Inference on CGM

  • Extending our approach

Public Health Perspective: Diabetes

Estimates collated by the CDC using data from the National Health and Nutrition Examination Survey.

  • 13.5% (40.1 million) in 2023 (Baumblatt et al., 2024)

  • $412.9 billion in expenditures in 2022 (Parker et al., 2024)

  • Increasing complications: kidney failure, stroke, heart failure

Continuous Glucose Monitors (CGM)

Example CGM device and readout from NIDDK.

  • Estimates blood glucose every 1-15 minutes, for up to \(15\) days

  • Recommended for diabetes management (American Diabetes Association, 2025)

  • System response to food, exercise, etc.

DASH4D-CGM

  • Dietary Approaches to Stop Hypertension for Diabetes
  • \(N = 89\) adults with type 2 diabetes

  • Randomized, crossover feeding study (Pilla et al., 2025)

    • Meals from study center

    • DASH4D vs. Comparison

    • Wore blinded CGM

  • DASH4D reduces mean glucose, increases time in normal range (Fang et al., 2025)

  • How does DASH4D affect postprandial glucose response (PPGR)?

Example DASH4D-compliant meal from the study website.

CGM Data

\(65\) participants

\(768\) PPGR curves

  • \(3\)-\(15\) per subject

Summarizing PPGR

Mixed effects model at each time \(t\)

Linear Mixed Models

\[PPGR^{(60)}_{ij} = \beta_0^{(60)} + DASH4D_{ij} \times \boxed{\boldsymbol{\beta_1^{(60)}}} + (\ldots) + U_i^{(60)} + \epsilon_{ij}^{(60)}\]

Linear Mixed Models

\[PPGR^{(120)}_{ij} = \beta_0^{(120)} + DASH4D_{ij} \times \boxed{\boldsymbol{\beta_1^{(120)}}} + (\ldots) + U_i^{(120)} + \epsilon_{ij}^{(120)}\]

Linear Mixed Models

\[PPGR^{(t)}_{ij} = \beta_0^{(t)} + DASH4D_{ij} \times \boxed{\boldsymbol{\beta_1^{(t)}}} + (\ldots) + U_i^{(t)} + \epsilon_{ij}^{(t)}\]
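A minimal sketch of the pointwise strategy above, fitting one separate random-intercept model per time point. The data are simulated, the other covariates \((\ldots)\) are omitted, and every name and number (`meals`, `beta1_true`, the time grid) is a hypothetical choice for illustration, not the DASH4D analysis.

```r
library(nlme)  # recommended package bundled with standard R installations

set.seed(1)
n_subj <- 30; n_meal <- 6
times <- seq(30, 210, by = 30)          # minutes after the meal (illustrative)

# Hypothetical meal-level design: subject id and diet indicator per meal
meals <- expand.grid(meal = 1:n_meal, id = 1:n_subj)
meals$DASH4D <- rbinom(nrow(meals), 1, 0.5)
u <- rnorm(n_subj, sd = 10)             # subject-level random intercepts
beta1_true <- function(t) -15 * sin(pi * t / 240)

# Fit PPGR^(t) = beta_0^(t) + DASH4D * beta_1^(t) + U_i^(t) + eps^(t) at one t
fit_at <- function(t) {
  d <- meals
  d$PPGR <- 20 + beta1_true(t) * d$DASH4D + u[d$id] + rnorm(nrow(d), sd = 8)
  fit <- lme(PPGR ~ DASH4D, random = ~ 1 | id, data = d)
  fixef(fit)[["DASH4D"]]
}
beta1_hat <- sapply(times, fit_at)      # one separate estimate per time point
```

Each \(\widehat{\beta}_1^{(t)}\) here is estimated without borrowing information from neighboring time points.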

Moving to a Functional Model

Sources of Structure:

  • Temporality

  • Within subject

\[\boxed{PPGR_{ij}(t) = \beta_0(t) + DASH4D_{ij} \times \boldsymbol{\beta_1(t)} + (\ldots) + U_i(t) + \epsilon_{ij}(t)}\]

Modeling \(U_i(t)\)

Longitudinal LME Approach

\(U_i(t) = b_{i0} + b_{i1} \times t + \ldots\)

  • \(\{1, t, \ldots \}\): chosen functions

  • \(\{b_{i0}, b_{i1}, \ldots \}\): correlated random effects (RE)

    • \((b_{i0}, b_{i1}, \ldots)^t \sim MVN(\mathbf{0}, \Sigma)\)

Functional PCA

\(U_i(t) = \sum_{k = 1}^K \xi_{ik} \phi_k(t)\)

  • \(\phi_k(t)\): data-driven functional principal components (FPCs)

  • \(\xi_{ik}\): scores

    • \(\xi_{ik} \sim N(0, \lambda_k)\) independent
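To make the FPCA representation concrete, here is a rough sketch that simulates \(U_i(t) = \sum_k \xi_{ik}\phi_k(t)\) on a dense grid and recovers the FPCs by eigen-decomposing the sample covariance. This is the classical two-step estimate rather than the Bayesian model discussed later; the sine/cosine FPCs, variances, and noise level are assumptions made for illustration.

```r
set.seed(2)
n <- 200; K <- 2
tt <- seq(0, 1, length.out = 50); dt <- tt[2] - tt[1]

# Two (approximately) orthonormal functions on [0, 1], chosen for illustration
phi <- cbind(sqrt(2) * sin(2 * pi * tt), sqrt(2) * cos(2 * pi * tt))
lambda <- c(4, 1)

# Independent scores xi_ik ~ N(0, lambda_k)
xi <- cbind(rnorm(n, sd = sqrt(lambda[1])), rnorm(n, sd = sqrt(lambda[2])))
Y <- xi %*% t(phi) + matrix(rnorm(n * length(tt), sd = 0.2), n)

# Eigen-decompose the sample covariance on the grid, then rescale so that
# sum(phi_hat[, k]^2) * dt = 1 and eigenvalues are on the functional scale
eig <- eigen(cov(Y), symmetric = TRUE)
phi_hat <- eig$vectors[, 1:K] / sqrt(dt)
lambda_hat <- eig$values[1:K] * dt

# Scores by numerical integration of the centered curves against the FPCs
xi_hat <- sweep(Y, 2, colMeans(Y)) %*% phi_hat * dt
```

The sign of each estimated FPC is arbitrary, so comparisons with the truth should be made up to a sign flip.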

A Functional Model for PPGR

\[PPGR_{ij}(t) = \beta_0(t) + DASH4D_{ij} \times \beta_1(t) + (\ldots) + \sum_{k = 1}^K \xi_{ik} \phi_k(t) + \epsilon_{ij}(t)\]

Standard approach:

  1. Estimate \(\beta_p(t)\)

  2. Estimate \(\widehat{\phi}_k(t)\) from residuals

  3. Condition on splines, \(\widehat{\phi}_k(t)\)

  4. Fit linear mixed effects model

What about uncertainty in \(\widehat{\phi}_k(t)\)?

  1. How to account for \(\widehat{\phi}_k(t)\) uncertainty?

  2. Does \(\beta_p(t)\) inference change?

  3. Subject-level inference?

Modeling \(\phi_k(t)\) as Parameters

Challenges modeling FPCs:

  • Constrained to be orthogonal (Stiefel manifold)

  • Maintain smoothness

Literature:

  • “A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data” (Peng & Paul, 2009)

  • “Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis” (Goldsmith et al., 2015)

  • “Monte Carlo Simulation on the Stiefel Manifold via Polar Expansion” (Jauch et al., 2021)

  • “Functional principal component models for sparse and irregularly spaced data by Bayesian inference” (Ye, 2023)

  • “Bayesian Functional Principal Components Analysis via Variational Message Passing with Multilevel Extensions” (Nolan et al., 2023)

The FAST Approach (Sartini, Zhou, et al., 2025)

  1. \(\phi_k(t) = \mathbf{B}(t)\psi_k\) for orthonormal splines \(\mathbf{B}(t) = \{B_1(t), \ldots , B_Q(t)\}\)
  • \(\phi_k(t)\) are orthonormal \(\iff\) vectors \(\psi_k\) are orthonormal
  • Substantial dimension reduction

    • FPC Functions: \(\text{dim}(\phi_k(t)) = \infty\)

    • FPC Vectors: \(\text{dim}(\phi_k(t)) \gg 100\)

    • FPC Spline Coefficients: \(\text{dim}(\psi_k) \in [20, 50]\)

  • Choice of basis \(\mathbf{B}(t)\) is crucial

    • Restrict to well-behaved \(\phi_k(t)\)

    • Shapes the \(\psi_k\) space
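The equivalence between orthonormal \(\phi_k(t)\) and orthonormal \(\psi_k\) can be checked numerically. The sketch below assumes a cubic B-spline basis orthonormalized by a QR decomposition on the evaluation grid (orthonormality in the vector sense); the grid size, \(Q\), and \(K\) are arbitrary, and this is one possible construction rather than necessarily the one used in FAST.

```r
library(splines)  # ships with R

tt <- seq(0, 1, length.out = 200)
Q <- 20; K <- 3

# B-spline basis evaluated on the grid, then orthonormalized by QR
B_raw <- bs(tt, df = Q, intercept = TRUE)
B <- qr.Q(qr(B_raw))          # 200 x Q with t(B) %*% B = I_Q

# Any Q x K matrix with orthonormal columns (here from a QR of Gaussian noise)
set.seed(3)
Psi <- qr.Q(qr(matrix(rnorm(Q * K), Q, K)))

# FPCs evaluated on the grid inherit orthonormality from Psi
Phi <- B %*% Psi
max_dev <- max(abs(crossprod(Phi) - diag(K)))   # numerically zero
```

Because \(\mathbf{B}^t\mathbf{B} = \mathbf{I}\), we get \(\Phi^t\Phi = \Psi^t\Psi\), so the constraint only needs to be imposed on the \(Q \times K\) coefficient matrix.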

The FAST Approach Cont.

  2. Penalized spline priors controlling “wiggliness”

    • Add \(-h_k \int [\phi_k''(t)]^2 dt\) to log-likelihood

    • Unique smoothing parameters \(h_k\)

  3. Model \(\Psi = [\psi_1 | \ldots | \psi_K]\) using parameter expansion (Jauch et al., 2021)

    • Sample unconstrained \(\mathbf{X}, X_{i,j} \sim N(0, 1)\)

    • Take SVD \(\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma} \mathbf{V}^t\)

    • \(\mathbf{UV}^t\) is uniform on Stiefel manifold (Chikuse, 2003)

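# Unconstrained matrix with iid N(0, 1) entries
X = matrix(rnorm(100),
           ncol = 10, nrow = 10)
# Polar part of X: keep the singular vectors, set singular values to 1
SVD = svd(X)
Psi = SVD$u %*% t(SVD$v)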

FAST FPC Prior

  • \(\int [\phi_k''(t)]^2 dt \approx \psi_k^t \mathbf{P} \psi_k\) for penalty matrix \(\mathbf{P}\) defined by \(\mathbf{B}(t)\)

\[f(\psi_k|h_k) \propto \text{MVN}\left(\mathbf{0}, (h_k\mathbf{P})^{-1}\right) \times \mathbf{I}(\psi_k \text{ are orthonormal})\]

  • \(h_k\) smoothing parameters have Gamma\((\alpha, \beta)\) priors

  • Smoothing spline prior with additional orthonormality constraint

Proposition: The joint prior distribution on \(\psi_k, h_k\) is proper if and only if \(2\beta\) is greater than the first eigenvalue of the penalty matrix \(\mathbf{P}\)
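One way to see where \(\mathbf{P}\) comes from: the sketch below builds a penalty matrix for a vector-orthonormal B-spline basis using second differences on the grid, so that \(\psi^t \mathbf{P} \psi\) approximates \(\int [\phi''(t)]^2 dt\). The finite-difference construction, grid, and test functions are assumptions for illustration; an exact penalty could instead be computed from the spline derivatives.

```r
library(splines)

tt <- seq(0, 1, length.out = 200); dt <- tt[2] - tt[1]
Q <- 20
B <- qr.Q(qr(bs(tt, df = Q, intercept = TRUE)))   # vector-orthonormal basis

# Second-difference operator: (D2 %*% f)[j] approximates f''(tt[j + 1])
D2 <- diff(diag(length(tt)), differences = 2) / dt^2

# Penalty matrix: t(psi) %*% P %*% psi approximates the integral of phi''(t)^2
P <- dt * crossprod(D2 %*% B)

# Penalty of the spline projection of a function evaluated on the grid
penalty <- function(f) {
  psi <- crossprod(B, f)        # spline coefficients of the projection of f
  drop(t(psi) %*% P %*% psi)
}
pen_smooth <- penalty(sin(2 * pi * tt))   # slowly varying function
pen_wiggly <- penalty(sin(6 * pi * tt))   # faster oscillation, larger penalty
```

A larger \(h_k\) places more prior weight on coefficient vectors with a small value of this quadratic form, that is, on smoother FPCs.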

Simulation Validation

  • Real data and canonical examples

  • Similar/improved accuracy of point estimates

  • Nominal coverage of credible intervals

Computation

  • \(50\)-\(90\%\) reduction in computation time relative to alternatives

  • Our application: \(768\) functions with \(20\) observations each: \(\approx 15\) minutes

  • Simulations

    • \(50\) functions with \(500\) observations each: \(\approx 35\) minutes

    • \(500\) functions with \(50\) observations each: \(\approx 15\) minutes

  • Scales linearly in the number of functions and observations per function

  • Efficiency improvements underway (see FASTer Description in the Appendices)

DASH4D Diet Effect

Variability Decomposition

\[PPGR_{ij}(t) = \color{#619CFF}{\beta_0(t) + DASH4D_{ij} \times \beta_1(t) + (\ldots)} + \color{#00BA38}{\sum_{k = 1}^K \xi_{ik} \phi_k(t)} + \color{#F8766D}{\epsilon_{ij}(t)}\]

Why FAST for Functional Modeling?

  • Fully-Bayesian: estimates \(+\) uncertainty for any quantity of interest

  • Accounts for all known sources of correlation and uncertainty

    • Valid inferences
  • Computationally efficient and stable

  • Implemented in standard software (Stan)

dash_fit = bfmm(Glucose ~ DASH4D + Age + BMI + Gender + TOD, id = ID)

Generalization

Agenda

  • Modeling Continuous Glucose Monitor (CGM) Data

  • A Bayesian Functional Model for CGM Data

  • Model Performance

  • Model Inference on CGM

  • Extension to Multivariate, Sparse Data

The CONTENT Study

  • Helicobacter pylori \(\Rightarrow\) child growth

  • May 2007 - Feb 2011 near Lima City, Peru

  • Longitudinal cohort of \(N = 197\) children selected randomly from a census

  • Length and weight measures

    • Z-scored to age/gender WHO standards

    • Increasing sparsity of observation over time

    • Missing/cancelled visits

CONTENT Data

Can we (dynamically) infer growth trajectories?

CONTENT Prediction

Multivariate Functional PCA (Happ & Greven, 2018)

\[\begin{pmatrix}W_{i}(t)\\ L_i(t) \end{pmatrix} = \begin{pmatrix}\mu^{(W)}(t; X_{i})\\ \mu^{(L)}(t; X_{i}) \end{pmatrix} + \sum_{k = 1}^K \xi_{ik} \begin{pmatrix}\phi^{(W)}_{k}(t)\\ \phi^{(L)}_k(t) \end{pmatrix} + \begin{pmatrix}\epsilon^{(W)}_i(t) \\ \epsilon^{(L)}_i(t) \end{pmatrix}\]

  • \(\mu^{(W)}(t; X_i), \mu^{(L)}(t; X_i)\): variate-specific means

  • \(\xi_{ik} \sim N(0, \lambda_k)\): independent scores

  • \(\Phi_k(t) = \begin{pmatrix} \phi^{(W)}_k(t) \\ \phi^{(L)}_k(t) \end{pmatrix}\): joint, data-driven FPCs

  • \(\epsilon^{(W)}_i(t), \epsilon^{(L)}_i(t)\): independent, normal errors

Multivariate, Sparse FAST (MSFAST)

Adjustments:

  • Sparsity handled by splines

  • Concatenate the FPCs

  • Scale variates in pre-processing

  • More robust posterior FPC alignment

Simulations:

  • 2-4 variables

  • Compared to available software

    • \(25\)-\(50\%\) error reduction

    • Accurate estimates

    • Competitive computation

  • Nominal coverage

Trajectory Prediction

  1. Learn \(\phi_k^{(p)}(t)\), \(\mu^{(p)}(t; X_i)\)

  2. New scores \(\xi_{ik}\) are conditionally Gaussian
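A small sketch of step 2 for a single variate, assuming the FPCs, mean, score variances \(\lambda_k\), and error variance \(\sigma^2\) are treated as known: given sparse observations \(y\), the scores are Gaussian with covariance \(V = (\Phi^t\Phi/\sigma^2 + \Lambda^{-1})^{-1}\) and mean \(V\Phi^t(y - \mu)/\sigma^2\). All functions and values below are hypothetical; in a fully-Bayesian fit this calculation would presumably be repeated for each posterior draw.

```r
set.seed(4)
tt_obs <- c(0.05, 0.2, 0.45, 0.8)        # sparse observation times (illustrative)
phi_fun <- function(t) cbind(sqrt(2) * sin(2 * pi * t), sqrt(2) * cos(2 * pi * t))
mu_fun <- function(t) 0 * t              # zero mean function for simplicity
lambda <- c(4, 1); sigma2 <- 0.25

# Simulate one new subject observed at the sparse times
xi_true <- rnorm(2, sd = sqrt(lambda))
Phi_obs <- phi_fun(tt_obs)
y <- mu_fun(tt_obs) + drop(Phi_obs %*% xi_true) +
  rnorm(length(tt_obs), sd = sqrt(sigma2))

# Conditional distribution of the scores given y: N(m, V)
V <- solve(crossprod(Phi_obs) / sigma2 + diag(1 / lambda))
m <- drop(V %*% crossprod(Phi_obs, y - mu_fun(tt_obs)) / sigma2)

# Predicted trajectory and pointwise 95% band on a dense grid
tt_new <- seq(0, 1, length.out = 101)
Phi_new <- phi_fun(tt_new)
pred <- mu_fun(tt_new) + drop(Phi_new %*% m)
pred_sd <- sqrt(rowSums((Phi_new %*% V) * Phi_new))   # sqrt of diag(Phi V Phi^t)
band <- cbind(lower = pred - 1.96 * pred_sd, upper = pred + 1.96 * pred_sd)
```

As more observations arrive, \(V\) shrinks and the band tightens, which is what makes dynamic updating of the trajectory possible.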

Ongoing Work

References

American Diabetes Association Professional Practice Committee for Diabetes. (2025). 1. Improving Care and Promoting Health in Populations: Standards of Care in Diabetes—2026. Diabetes Care, 49(Supplement_1), S13–S26. https://doi.org/10.2337/dc26-S001
Baumblatt, J., Fryar, C., Gu, Q., & Ashman, J. (2024). Prevalence of Total, Diagnosed, and Undiagnosed Diabetes in Adults: United States, August 2021–August 2023. National Center for Health Statistics (U.S.). https://doi.org/10.15620/cdc/165794
Checkley, W., Epstein, L. D., Gilman, R. H., Black, R. E., Cabrera, L., & Sterling, C. R. (1998). Effects of Cryptosporidium parvum infection in Peruvian children: Growth faltering and subsequent catch-up growth. American Journal of Epidemiology, 148(5), 497–506. https://doi.org/10.1093/oxfordjournals.aje.a009675
Checkley, W., Epstein, L. D., Gilman, R. H., Cabrera, L., & Black, R. E. (2003). Effects of acute diarrhea on linear growth in Peruvian children. American Journal of Epidemiology, 157(2), 166–175. https://doi.org/10.1093/aje/kwf179
Chikuse, Y. (2003). Statistics on Special Manifolds (P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg, I. Olkin, N. Wermuth, & S. Zeger, Eds.; Vol. 174). Springer. https://doi.org/10.1007/978-0-387-21540-2
Crainiceanu, C. M., Goldsmith, J., Leroux, A., & Cui, E. (2024). Functional Data Analysis with R. Chapman and Hall/CRC. https://doi.org/10.1201/9781003278726
Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions. Numerische Mathematik, 31, 377–403.
Cui, E., Leroux, A., Smirnova, E., & Crainiceanu, C. M. (2022). Fast Univariate Inference for Longitudinal Functional Models. Journal of Computational and Graphical Statistics, 31(1), 219–230. https://doi.org/10.1080/10618600.2021.1950006
Fang, M., Wang, D., Rebholz, C. M., Echouffo-Tcheugui, J. B., Tang, O., Wang, N.-Y., Mitchell, C. M., Pilla, S. J., Appel, L. J., & Selvin, E. (2025). DASH4D diet for glycemic control and glucose variability in type 2 diabetes: A randomized crossover trial. Nature Medicine, 1–8. https://doi.org/10.1038/s41591-025-03823-3
Goldsmith, J., Zipunnikov, V., & Schrack, J. (2015). Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis. Biometrics, 71(2), 344–353. https://doi.org/10.1111/biom.12278
Happ, C., & Greven, S. (2018). Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association, 113(522), 649–659. https://doi.org/10.1080/01621459.2016.1273115
Jauch, M., Hoff, P. D., & Dunson, D. B. (2021). Monte Carlo Simulation on the Stiefel Manifold via Polar Expansion. Journal of Computational and Graphical Statistics, 30(3), 622–631. https://doi.org/10.1080/10618600.2020.1859382
Li, C., Xiao, L., & Luo, S. (2020). Fast covariance estimation for multivariate sparse functional data. Stat (International Statistical Institute), 9(1), e245. https://doi.org/10.1002/sta4.245
Nolan, T. H., Goldsmith, J., & Ruppert, D. (2023). Bayesian Functional Principal Components Analysis via Variational Message Passing with Multilevel Extensions. Bayesian Analysis, 1–27. https://doi.org/10.1214/23-BA1393
Nolan, T. H., Richardson, S., & Ruffieux, H. (2025). Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves. Computational Statistics & Data Analysis, 203, 108094. https://doi.org/10.1016/j.csda.2024.108094
Parker, E. D., Lin, J., Mahoney, T., Ume, N., Yang, G., Gabbay, R. A., ElSayed, N. A., & Bannuru, R. R. (2024). Economic Costs of Diabetes in the U.S. in 2022. Diabetes Care, 47(1), 26–43. https://doi.org/10.2337/dci23-0085
Peng, J., & Paul, D. (2009). A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data. Journal of Computational and Graphical Statistics, 18(4), 995–1015. https://doi.org/10.1198/jcgs.2009.08011
Pilla, S. J., Yeh, H.-C., et al. (2025). Dietary Patterns, Sodium Reduction, and Blood Pressure in Type 2 Diabetes: The DASH4D Randomized Clinical Trial. JAMA Internal Medicine. https://doi.org/10.1001/jamainternmed.2025.1580
Sartini, J., Zeger, S., & Crainiceanu, C. (2025). Bayesian Multivariate Sparse Functional PCA. arXiv. https://doi.org/10.48550/arXiv.2509.03512
Sartini, J., Zhou, X., Selvin, L., Zeger, S., & Crainiceanu, C. M. (2025). Fast Bayesian functional principal components analysis. Journal of Computational and Graphical Statistics, 1–20. https://doi.org/10.1080/10618600.2025.2592768
Scheipl, F., Staicu, A.-M., & Greven, S. (2015). Functional Additive Mixed Models. Journal of Computational and Graphical Statistics, 24(2), 477–501. https://doi.org/10.1080/10618600.2014.901914
Ye, J. (2023). Functional principal component models for sparse and irregularly spaced data by Bayesian inference. Journal of Applied Statistics, 1(31). https://doi.org/10.1080/02664763.2023.2197587

Appendices

DASH4D Design

FAST Accuracy

FAST Coverage

Fast Univariate Inference (Cui et al., 2022)

  1. Fit local linear mixed effects models

  2. Smooth along the functional domain

  3. Point-wise confidence bands using the smoothing operator

  4. Joint confidence bands using analytic procedure or bootstrap

FAST vs FUI

Moving the needle for fasting glucose?

FAST Timing - N

FAST Timing - M

FASTer Description

  • Project \(Y_i(t) = \mathbf{B}(t) \eta_i\), only model \(\eta_i\)

  • Choose \(\mathbf{B}(t)\) so that its evaluation on the observation grid (matrix \(\mathbf{B}\)) has orthonormal columns

    • \(\eta_i = (\mathbf{B}^t \mathbf{B})^{-1}\mathbf{B}^t Y_i = \mathbf{B}^t Y_i\), still independent

  • Computations should not scale with observations per function
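The projection step can be sketched as follows, assuming curves observed on a common dense grid and a B-spline basis orthonormalized by QR (both are assumptions for illustration, as are the simulated curves). With orthonormal columns, the least-squares coefficients reduce to \(\mathbf{B}^t Y_i\).

```r
library(splines)

set.seed(5)
M <- 500; n <- 50; Q <- 20               # grid size, number of curves, basis size
tt <- seq(0, 1, length.out = M)
B <- qr.Q(qr(bs(tt, df = Q, intercept = TRUE)))   # M x Q, t(B) %*% B = I_Q

# Simulated curves stored as rows of Y, with iid Gaussian noise
Y <- outer(rnorm(n), sin(2 * pi * tt)) + matrix(rnorm(n * M, sd = 0.3), n, M)

# Projection coefficients: one Q-vector eta_i per curve (rows of Eta)
Eta <- Y %*% B

# General least-squares formula, for comparison only
Eta_ls <- t(solve(crossprod(B), crossprod(B, t(Y))))
```

Downstream modeling then works with the \(n \times Q\) matrix `Eta` instead of the \(n \times M\) data, which is why the cost need not grow with \(M\). Under iid noise with variance \(\sigma^2\), the projected noise has covariance \(\sigma^2 \mathbf{B}^t\mathbf{B} = \sigma^2 \mathbf{I}_Q\).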

Preliminary Results:

FASTer Accuracy

FASTer Coverage

MSFAST Trajectory Accuracy

MSFAST Trajectory Coverage

MSFAST FPC Accuracy

MSFAST FPC Coverage

Generalized Functional PCA

\[g\bigl(\mathbb{E}[Y_i(t)]\bigr) = \mu(t; X_{i}) + \sum_{k = 1}^K \xi_{ik}\phi_k(t)\]